ABSTRACT

This paper concerns the distribution of the sum of the k
largest observations in a sample of m observations from a
gamma distribution with n degrees of freedom. The density
and cdf of the distribution are given as sums of gamma
density and distribution functions, respectively. If n is
integer valued then each sum consists of a finite number of
terms. The distribution of the sum arises in a problem of
selecting variables in a multiple regression analysis.

1. Introduction.

The distribution of the sum of the k largest (smallest)
order statistics arises in various statistical investigations.

In life testing, for example, suppose that m items are put
on trial and that the experiment is terminated when k < m
items fail. Let the lengths of life of the items be independently
and identically distributed according to a gamma distribution
with an unknown scale parameter $\theta$, say.
If $X_1 \le X_2 \le \dots \le X_k$ denote the observed failure times then
$\sum_{i=1}^{k} X_i$, representing the sum of the k smallest order statistics,
together with $X_k$ is a sufficient statistic for $\theta$. If
$Z_1 \le Z_2 \le \dots \le Z_k$ denote the observed failure times for
another set of items for which the scale parameter is $\theta'$, say,
then the ratio $R = \sum_{i=1}^{k} X_i / \sum_{i=1}^{k} Z_i$ may be used to test the
hypothesis $H: \theta = \theta'$. Note that, under H, the distribution of R does
not depend on the value of the scale parameter.

For another example, suppose that m customers are waiting
in a queue for service. In certain situations, it may be
desirable to divert to a special queue the k customers who are
likely to take individually longer servicing times than the
remaining customers. Let the servicing times of the
customers be independently and identically distributed. Then
the total servicing time of the special queue represents the
sum of the k largest order statistics, whose distribution would
be of interest.

Let $Y_k$ denote the sum of the k largest values in a sample of m
observations from a gamma distribution with n degrees of
freedom. In this paper we show that the cumulative distribution
function (cdf) of $Y_k$ can be expressed as a linear
function of gamma distribution functions. If n is a positive
integer then the linear function consists of a finite
number of terms. The distribution of $Y_k$ is obtained by inverting
its Laplace transform. The distribution of the sum of the
k smallest values in the sample is obtained similarly.
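
As an informal illustration of the quantity just defined, the following sketch simulates the sum of the k largest values in a sample of m gamma observations. It assumes a unit scale parameter; the function names `simulate_yk` and `mean` and the replication count are hypothetical choices made for this example and are not part of the paper.

```python
import random


def simulate_yk(m, k, n, reps=20000, seed=0):
    """Draw `reps` realisations of Y_k: the sum of the k largest values
    among m independent gamma variates with shape n and unit scale."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        sample = sorted(rng.gammavariate(n, 1.0) for _ in range(m))
        draws.append(sum(sample[m - k:]))   # the k largest order statistics
    return draws


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)
```

For k = m the simulated values are sums of all m observations, so their mean should be close to mn, which gives a quick sanity check of the simulation.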

For an application of the given result consider the
problem of determining the distribution of the sample multiple
correlation in regression analysis, where the k variables
selected for inclusion in the regression equation from a
given set of m variables are those which maximize the value of $R^2$.
Suppose that the variables are jointly normally distributed
and independent. It is shown in Section 3 that $(M-1)R_k^2$ is
asymptotically distributed for large M (sample size) as the
sum of the k largest order statistics in a sample of m observations
from a chi-square ($\chi^2$) distribution with 1 degree
of freedom.




2. Distribution of $Y_k$.

Let $X_r$ denote the r-th smallest value in a sample
of m observations from a gamma distribution with n degrees of freedom,
and let $Y_k = \sum_{r=m-k+1}^{m} X_r$ denote the sum of the k largest observations in
the sample. Let $f_k(x)$ and $F_k(x)$ denote the density and cdf of $Y_k$,
respectively, and let $L_k(\theta)$ denote the Laplace transform of the
distribution. The density and cdf are obtained by inverting $L_k(\theta)$, as follows.

Let Y be distributed according to the gamma distribution with n
degrees of freedom, and let $g_n(x)$ and $G_n(x)$ denote its density and cdf,
respectively. The density function and the Laplace transform of the
distribution are given by

$$ g_n(x) = x^{n-1} e^{-x} / \Gamma(n), \quad x > 0, $$

$$ \int_0^\infty e^{-\theta x}\, dG_n(x) = (1+\theta)^{-n}, \quad \theta > 0. \qquad (2.1) $$


Let $\phi_x(\theta)$ denote the Laplace transform of the conditional distribution
of Y, given $Y \ge x$, where $x \ge 0$. We have

$$ \phi_x(\theta) = (1-G_n(x))^{-1} \int_x^\infty e^{-\theta y}\, dG_n(y) $$

$$ = (1+\theta)^{-n}\, (1-G_n((1+\theta)x))\, (1-G_n(x))^{-1}. $$
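
The closed form of the conditional Laplace transform can be checked numerically. The sketch below does so for a positive integer n, using the finite-sum expression for $1-G_n(x)$ and a composite Simpson rule on a truncated range. The helper names, the truncation point `upper` and the step count are assumptions made for this illustration only.

```python
import math


def gamma_pdf(n, x):
    """Gamma density g_n(x) with unit scale, for x > 0."""
    return math.exp((n - 1) * math.log(x) - x - math.lgamma(n))


def gamma_sf_int(n, x):
    """1 - G_n(x) for a positive integer n (finite-sum formula)."""
    return math.exp(-x) * sum(x ** a / math.factorial(a) for a in range(n))


def phi_closed(n, x, theta):
    """Closed form: (1+theta)^(-n) (1 - G_n((1+theta)x)) / (1 - G_n(x))."""
    return (1 + theta) ** (-n) * gamma_sf_int(n, (1 + theta) * x) / gamma_sf_int(n, x)


def phi_numeric(n, x, theta, upper=60.0, steps=20000):
    """Integral definition, by Simpson's rule on [x, upper] (steps must be even)."""
    h = (upper - x) / steps

    def f(y):
        return math.exp(-theta * y) * gamma_pdf(n, y)

    total = f(x) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(x + i * h)
    return total * h / 3 / gamma_sf_int(n, x)
```

The two functions should agree to within the quadrature error for any x > 0 and any non-negative value of theta.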


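Let $H(x)$ denote the cdf of $X_{m-k}$. Given $X_{m-k} = x$, $Y_k$ is distributed
as the sum of k independent observations from the conditional distribution
of Y, given $Y > x$. Therefore, for $1 \le k < m$,

$$ L_k(\theta) = \int_0^\infty \phi_x^k(\theta)\, dH(x) $$

$$ = m \binom{m-1}{k} \int_0^\infty \phi_x^k(\theta)\, G_n^{m-k-1}(x)\, (1-G_n(x))^k\, dG_n(x) $$

$$ = m \binom{m-1}{k} (1+\theta)^{-nk} \int_0^\infty (1-G_n((1+\theta)x))^k\, G_n^{m-k-1}(x)\, dG_n(x), $$

and, for $k = m$,

$$ L_m(\theta) = (1+\theta)^{-mn}. \qquad (2.2) $$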
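
For n = 1 (the exponential case) formula (2.2) can be compared with an independent closed form. The sketch below evaluates the integral in (2.2) by Simpson's rule and compares it with the product $\prod_{j=1}^{m} (1 + \theta \min(j,k)/j)^{-1}$. That product is not taken from the paper: it is assumed here from the standard representation of exponential order statistics as weighted sums of independent exponential spacings. Function names and quadrature settings are likewise illustrative.

```python
import math


def laplace_22_exponential(m, k, theta, upper=60.0, steps=60000):
    """Evaluate (2.2) for n = 1 by Simpson's rule (steps must be even)."""
    if k == m:
        return (1 + theta) ** (-m)

    def integrand(x):
        survival_scaled = math.exp(-(1 + theta) * x)   # 1 - G_1((1+theta)x)
        cdf = 1.0 - math.exp(-x)                        # G_1(x)
        return survival_scaled ** k * cdf ** (m - k - 1) * math.exp(-x)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    return m * math.comb(m - 1, k) * (1 + theta) ** (-k) * integral


def laplace_product(m, k, theta):
    """Assumed reference value: product form for the exponential case."""
    value = 1.0
    for j in range(1, m + 1):
        value /= 1 + theta * min(j, k) / j
    return value
```

For m = 2 and k = 1 both expressions reduce to $2/((1+\theta)(2+\theta))$, which can be confirmed by hand.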


The Laplace transform of the distribution of the sum of the k
smallest observations in the sample is obtained by substituting
$G_n((1+\theta)x)$ for $1-G_n((1+\theta)x)$ and $1-G_n(x)$ for $G_n(x)$
in the right hand side of (2.2).

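First we consider the special case when n is a positive integer.
Let $c_{uv}$ denote the coefficient of $x^u$ in the expansion of

$$ \left( \sum_{\alpha=0}^{n-1} \frac{x^\alpha}{\alpha!} \right)^{v} $$

for non-negative integer values of u and v. The numbers $c_{uv}$ can be
computed recursively from the following formula.

$$ c_{u1} = 1/u!, \quad u \le n-1, $$

$$ c_{u1} = 0, \quad u \ge n, $$

$$ c_{uv} = 0, \quad u > (n-1)v, $$

$$ c_{uv} = \sum_{\alpha=0}^{n-1} \frac{1}{\alpha!}\, c_{u-\alpha,\,v-1}, \quad 0 \le u \le (n-1)v, \; v > 1, $$

where $c_{u-\alpha,\,v-1}$ is taken as zero when $u < \alpha$.

From (2.2) and the formula

$$ 1 - G_n(x) = e^{-x} \sum_{\alpha=0}^{n-1} \frac{x^\alpha}{\alpha!} $$

we have after simplification for $1 \le k < m$

$$ L_k(\theta) = \frac{m}{\Gamma(n)} \binom{m-1}{k} \sum_{r=0}^{m-k-1} \sum_{u=0}^{(n-1)k} \sum_{v=0}^{(n-1)r} (-1)^r c_{uk}\, c_{vr} \binom{m-k-1}{r} \Gamma(u+v+n)\, (k+1+r)^{-u-v-n}\, (1+\theta)^{-nk+u}\, (1+\alpha_r\theta)^{-u-v-n} \qquad (2.3) $$

where $\alpha_r = k/(1+r+k)$. Through decomposition into partial fractions we
have

$$ (1+\theta)^{-nk+u} (1+\alpha_r\theta)^{-u-v-n} = \sum_{s=0}^{nk-u-1} \frac{a_s}{(1+\theta)^{nk-u-s}} + \sum_{s=0}^{u+v+n-1} \frac{b_s}{(1+\alpha_r\theta)^{u+v+n-s}} \qquad (2.4) $$

where

$$ a_s = \binom{u+v+n+s-1}{s} (-\alpha_r)^s (1-\alpha_r)^{-u-v-n-s}, $$

$$ b_s = (-1)^{nk-u} \binom{nk-u+s-1}{s} \left( \frac{1}{\alpha_r} - 1 \right)^{-nk+u-s} \alpha_r^{-s}. $$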
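
The partial fraction decomposition of a product of two negative powers can be verified numerically. In the sketch below p stands for nk - u and q for u + v + n, both assumed to be positive integers, and alpha stands for $\alpha_r$ with alpha different from 1. The function names are hypothetical, and the coefficient formulas coded here are the binomial expressions $a_s = \binom{q+s-1}{s}(-\alpha)^s(1-\alpha)^{-q-s}$ and $b_s = (-1)^p\binom{p+s-1}{s}(1/\alpha-1)^{-p-s}\alpha^{-s}$.

```python
import math


def partial_fraction_coefficients(p, q, alpha):
    """Coefficients a_s (s < p) and b_s (s < q) for
    (1+theta)^(-p) * (1+alpha*theta)^(-q)."""
    a = [math.comb(q + s - 1, s) * (-alpha) ** s * (1 - alpha) ** (-q - s)
         for s in range(p)]
    b = [(-1) ** p * math.comb(p + s - 1, s) * (1 / alpha - 1) ** (-p - s) * alpha ** (-s)
         for s in range(q)]
    return a, b


def product_form(p, q, alpha, theta):
    """Left hand side: the undecomposed product."""
    return (1 + theta) ** (-p) * (1 + alpha * theta) ** (-q)


def decomposed_form(p, q, alpha, theta):
    """Right hand side: the sum of simple fractions."""
    a, b = partial_fraction_coefficients(p, q, alpha)
    first = sum(a[s] * (1 + theta) ** (-(p - s)) for s in range(p))
    second = sum(b[s] * (1 + alpha * theta) ** (-(q - s)) for s in range(q))
    return first + second
```

Because the two sides are rational functions of theta, agreement at a handful of distinct points is a reasonable practical check of the coefficients.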


By (2.1) the right hand side of (2.4) is the Laplace transform of the
function

$$ g^*_{ruv}(x) = \sum_{s=0}^{nk-u-1} a_s\, g_{nk-u-s}(x) + \alpha_r^{-1} \sum_{s=0}^{u+v+n-1} b_s\, g_{u+v+n-s}(x/\alpha_r). \qquad (2.5) $$


Therefore, by inverting (2.3) we get

$$ f_k(x) = \frac{m}{\Gamma(n)} \binom{m-1}{k} \sum_{r=0}^{m-k-1} \sum_{u=0}^{(n-1)k} \sum_{v=0}^{(n-1)r} (-1)^r c_{uk}\, c_{vr} \binom{m-k-1}{r} \Gamma(u+v+n)\, (k+1+r)^{-u-v-n}\, g^*_{ruv}(x). \qquad (2.6) $$

Let

$$ \mu^{(\ell)}_{ruv} = \sum_{t=0}^{\ell} \binom{\ell}{t} \alpha_r^t\, \frac{\Gamma(nk-u+\ell-t)}{\Gamma(nk-u)}\, \frac{\Gamma(n+u+v+t)}{\Gamma(n+u+v)} $$

where $\ell$ is a positive integer. From (2.3) we obtain the $\ell$-th moment of
$Y_k$, given by

$$ E(Y_k^\ell) = \frac{m}{\Gamma(n)} \binom{m-1}{k} \sum_{r=0}^{m-k-1} \sum_{u=0}^{(n-1)k} \sum_{v=0}^{(n-1)r} (-1)^r c_{uk}\, c_{vr} \binom{m-k-1}{r} \Gamma(u+v+n)\, (k+1+r)^{-u-v-n}\, \mu^{(\ell)}_{ruv}. \qquad (2.7) $$

For n = 1 and $\ell = 1$, the formula (2.7) checks with the known result
(see e.g. David (1970) 2.7.3)

$$ E(Y_k) = \sum_{r=m-k+1}^{m} \sum_{i=1}^{r} \frac{1}{m-i+1}. $$
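
The check for n = 1 can be reproduced in exact rational arithmetic. The sketch below builds the coefficients $c_{uv}$ by the recursion on v and evaluates the first moment from the triple sum, in which the first-moment factor reduces to $(nk-u) + \alpha_r (n+u+v)$ with $\alpha_r = k/(k+1+r)$. The function names are hypothetical. The reference value $k(1 + \sum_{j=k+1}^{m} 1/j)$ used for n = 1 is the standard expectation for exponential order statistics, stated here as an assumption of the example rather than quoted from the paper.

```python
import math
from fractions import Fraction


def c_table(n, vmax):
    """table[v][u] = coefficient of x^u in (sum_{a<n} x^a / a!)^v,
    for a positive integer n and 0 <= v <= vmax."""
    base = [Fraction(1, math.factorial(a)) for a in range(n)]
    table = [[Fraction(1)]]                      # v = 0: the constant polynomial 1
    for v in range(1, vmax + 1):
        prev = table[-1]
        row = [Fraction(0)] * ((n - 1) * v + 1)
        for u in range(len(row)):
            for a in range(n):
                if 0 <= u - a < len(prev):       # terms with negative index vanish
                    row[u] += base[a] * prev[u - a]
        table.append(row)
    return table


def mean_yk(m, k, n):
    """First moment of Y_k for a positive integer n and 1 <= k < m."""
    c = c_table(n, max(k, m - k - 1))
    total = Fraction(0)
    for r in range(m - k):
        alpha = Fraction(k, k + 1 + r)
        for u in range((n - 1) * k + 1):
            for v in range((n - 1) * r + 1):
                p, q = n * k - u, n + u + v
                weight = ((-1) ** r * c[k][u] * c[r][v] * math.comb(m - k - 1, r)
                          * math.factorial(q - 1) * Fraction(1, (k + 1 + r) ** q))
                total += weight * (p + alpha * q)   # first-moment factor
    return total * m * math.comb(m - 1, k) / math.factorial(n - 1)
```

For example, `mean_yk(3, 1, 1)` should return the fraction 11/6, the expected maximum of three unit exponentials.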

Next, we consider the general case when n is not a positive integer.
The case n = 1/2 is of special interest, as in the example described in the
previous section. Let

$$ \Phi(a,b;x) = 1 + \sum_{r=1}^{\infty} \frac{(a)_r}{(b)_r} \frac{x^r}{r!}, \qquad (a)_r = a(a+1)\cdots(a+r-1), $$

denote the confluent hypergeometric function, and let

$$ \Phi^s(a,b;x) = \sum_{r=0}^{\infty} d_{rs} \frac{x^r}{r!} \qquad (2.8) $$

where s is a positive integer. Differentiating (2.8) with respect
to x and using the formula for the derivative of a confluent
hypergeometric function, given by

$$ \frac{d}{dx} \Phi(a,b;x) = \frac{a}{b}\, \Phi(a+1,b+1;x), $$

we get

$$ \frac{sa}{b}\, \Phi^{s-1}(a,b;x)\, \Phi(a+1,b+1;x) = \sum_{r=0}^{\infty} d_{r+1,s} \frac{x^r}{r!}. $$

Equating the coefficients of $x^r$ on both sides we obtain a recursive relation
for the coefficients $d_{rs}$, given by $d_{0s} = 1$, $d_{r1} = (a)_r/(b)_r$, and

$$ d_{r+1,s} = \frac{sa}{b} \sum_{t=0}^{r} \binom{r}{t} \frac{(a+1)_{r-t}}{(b+1)_{r-t}}\, d_{t,s-1}, \quad s > 1. \qquad (2.9) $$
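
The recursion for the coefficients of a power of the confluent hypergeometric series can be checked against direct multiplication of truncated power series. The sketch below uses exact fractions; the function names and the truncation order `rmax` are illustrative assumptions. The recursion coded is $d_{r+1,s} = (sa/b)\sum_{t=0}^{r}\binom{r}{t}\,[(a+1)_{r-t}/(b+1)_{r-t}]\,d_{t,s-1}$ with $d_{0s} = 1$ and $d_{r1} = (a)_r/(b)_r$.

```python
from fractions import Fraction
from math import comb, factorial


def poch(a, r):
    """Pochhammer symbol (a)_r = a (a+1) ... (a+r-1), with (a)_0 = 1."""
    out = Fraction(1)
    for i in range(r):
        out *= a + i
    return out


def d_recursive(a, b, smax, rmax):
    """d[s][r] for 1 <= s <= smax and 0 <= r <= rmax, by the recursion."""
    a, b = Fraction(a), Fraction(b)
    d = {1: [poch(a, r) / poch(b, r) for r in range(rmax + 1)]}
    for s in range(2, smax + 1):
        row = [Fraction(1)]
        for r in range(rmax):
            acc = sum(comb(r, t) * poch(a + 1, r - t) / poch(b + 1, r - t) * d[s - 1][t]
                      for t in range(r + 1))
            row.append(s * a / b * acc)
        d[s] = row
    return d


def d_direct(a, b, s, rmax):
    """The same coefficients, by multiplying the truncated series s times."""
    a, b = Fraction(a), Fraction(b)
    base = [poch(a, r) / poch(b, r) / factorial(r) for r in range(rmax + 1)]
    prod = [Fraction(1)] + [Fraction(0)] * rmax
    for _ in range(s):
        prod = [sum(prod[i] * base[r - i] for i in range(r + 1)) for r in range(rmax + 1)]
    return [prod[r] * factorial(r) for r in range(rmax + 1)]
```

The recursion needs only the previous row in s, whereas direct multiplication recomputes every convolution, which is the practical reason to prefer the recursion when many terms are required.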

The above formula will be used below for the special case in
which a = n and b = n + 1. In this case (2.9) simplifies to

$$ d_{r+1,s} = ns \sum_{t=0}^{r} \binom{r}{t} (n+r-t+1)^{-1}\, d_{t,s-1}, \quad s > 1. $$

We have

$$ G_n(x) = \sum_{r=1}^{\infty} g_{n+r}(x) = \frac{x^n e^{-x}}{\Gamma(n+1)}\, \Phi(1,n+1;x) = \frac{x^n}{\Gamma(n+1)}\, \Phi(n,n+1;-x). \qquad (2.10) $$
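
The series representation of the gamma cdf can be checked in the case n = 1/2, where $G_{1/2}(x) = \mathrm{erf}(\sqrt{x})$ because twice a gamma variable with 1/2 degrees of freedom is chi-square with 1 degree of freedom. The sketch below sums $\Phi(n,n+1;z) = \sum_r [n/(n+r)]\, z^r/r!$ term by term; the function names and the number of terms are assumptions of this example, and the plain summation is only suitable for moderate x because the series alternates.

```python
import math


def kummer_n_nplus1(n, z, terms=200):
    """Phi(n, n+1; z) = sum_r n/(n+r) * z^r / r!, summed term by term."""
    total, power = 0.0, 1.0            # power holds z^r / r!
    for r in range(terms):
        total += n / (n + r) * power
        power *= z / (r + 1)
    return total


def gamma_cdf_series(n, x):
    """G_n(x) = x^n / Gamma(n+1) * Phi(n, n+1; -x)."""
    return x ** n / math.gamma(n + 1) * kummer_n_nplus1(n, -x)
```

For n = 1 the same function should reproduce $1 - e^{-x}$, which gives a second independent check.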


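The last step in (2.10) follows from the relation $\Phi(b-a,b;-x) = e^{-x}\Phi(a,b;x)$.
Using (2.10) in (2.2) we get for $1 \le k < m$

$$ L_k(\theta) = \binom{m}{k} (1+\theta)^{-nk} \int_0^\infty (1-G_n((1+\theta)x))^k\, dG_n^{m-k}(x) $$

$$ = k \binom{m}{k} (1+\theta)^{-nk} \int_0^\infty G_n^{m-k}\!\left( \frac{x}{1+\theta} \right) (1-G_n(x))^{k-1} g_n(x)\, dx $$

$$ = \frac{k \binom{m}{k}}{(\Gamma(n+1))^{m-k}} \sum_{r=0}^{\infty} \frac{(-1)^r}{r!}\, d_{r,m-k}\, I_{rk}\, (1+\theta)^{-mn-r} \qquad (2.11) $$

where

$$ I_{rk} = \int_0^\infty x^{n(m-k)+r} (1-G_n(x))^{k-1} g_n(x)\, dx. $$

Inverting (2.11) we obtain the distribution of $Y_k$, given by the density function

$$ f_k(x) = \frac{k \binom{m}{k}}{(\Gamma(n+1))^{m-k}} \sum_{r=0}^{\infty} \frac{(-1)^r}{r!}\, d_{r,m-k}\, I_{rk}\, g_{mn+r}(x). \qquad (2.12) $$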
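
For k = 1 the series density can be compared with the elementary density of the largest observation, $m\, G_n^{m-1}(x)\, g_n(x)$. In that case the integral coefficient has the closed form $\Gamma(mn+r)/\Gamma(n)$, a step worked out for this example rather than stated in the paper. The sketch below assumes n = 1/2 in its check, so that $G_n(x) = \mathrm{erf}(\sqrt{x})$; the function names and the truncation `terms` are hypothetical, and the coefficients use the simplified recursion $d_{r+1,s} = ns\sum_{t=0}^{r}\binom{r}{t}\,d_{t,s-1}/(n+r-t+1)$ for a = n, b = n + 1.

```python
import math


def d_row(n, s, rmax):
    """d_{r,s} for a = n, b = n + 1 and r = 0..rmax (simplified recursion)."""
    row = [n / (n + r) for r in range(rmax + 1)]            # s = 1
    for level in range(2, s + 1):
        new = [1.0]
        for r in range(rmax):
            acc = sum(math.comb(r, t) * row[t] / (n + r - t + 1) for t in range(r + 1))
            new.append(n * level * acc)
        row = new
    return row


def gamma_pdf(shape, x):
    """Gamma density with unit scale, for x > 0."""
    return math.exp((shape - 1) * math.log(x) - x - math.lgamma(shape))


def density_largest(m, n, x, terms=80):
    """Series density of Y_1 (k = 1), truncated after `terms` + 1 terms."""
    d = d_row(n, m - 1, terms)
    total = 0.0
    for r in range(terms + 1):
        i_r1 = math.exp(math.lgamma(m * n + r) - math.lgamma(n))   # integral coefficient
        total += (-1) ** r / math.factorial(r) * d[r] * i_r1 * gamma_pdf(m * n + r, x)
    return m * math.gamma(n + 1) ** (-(m - 1)) * total
```

The series alternates, so in floating point it is best used for moderate values of x; for large x or large m a higher-precision evaluation would be a safer choice.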




Table I below shows, for illustration, the upper 90th and 95th
percentiles of the distribution of $Y_k$ for certain values of
m, k and n. Since $Y_1$ represents the largest order statistic
in a sample of size m, and $Y_m$ is distributed as a gamma
random variable with mn degrees of freedom, the percentiles
of $Y_k$ for k = 1 and k = m can be obtained from available tables
of the gamma distribution and the distribution of the largest
order statistic from that distribution. Percentage points
of the distribution of order statistics from the gamma
distribution have been tabulated by Gupta (1960). The percentage
points of $Y_1$ given in the table agree with the corresponding
points given in Table III of Gupta.

3. Asymptotic Distribution of $R_k^2$.

Let $X_1, \dots, X_m$ denote a given set of m predictor variables
and Y denote the predictand in a multiple regression problem,
where the variables are jointly normally distributed. Specifically,
let $(Y, X_1, \dots, X_m)' \overset{d}{\sim} N(\mu, \Sigma)$, where d means "distributed
as". Since we are concerned with the correlation
coefficient, we can assume without loss of generality
that $\mu = 0$ and that $\Sigma$ is a correlation matrix. Let $y$ and $x_i$ denote the
vectors of deviations of the observed values of Y and $X_i$ from their
respective means in a sample of M observations obtained from the given
distribution. Consider a subset of the predictor variables, say, $X_1, \dots, X_k$.
Let $X = (x_1, \dots, x_k)$. The square of the sample multiple correlation between
Y and $(X_1, \dots, X_k)$ is given by

$$ R^2 = (y' X (X'X)^{-1} X' y) / (y'y). $$

By the law of large numbers

$$ (M-1)(X'X)^{-1} \overset{P}{\to} \Sigma_k^{-1} \quad \text{as } M \to \infty $$

where $\Sigma_k$ denotes the correlation matrix of the predictor variables
$X_1, \dots, X_k$. Therefore, asymptotically

$$ (M-1) R^2 \approx (y' X \Sigma_k^{-1} X' y) / (y'y). \qquad (3.1) $$

Let $R_k$ denote the largest sample multiple correlation among the $\binom{m}{k}$
correlations between Y and k out of the m predictor variables.
Let $V_i = (y'x_i)^2 / (y'y)$, and let $S_k$ denote the sum of the k largest
values among $V_1, \dots, V_m$. Let Cor$(Y, X_i) = \rho$ and Cor$(X_i, X_j) = \lambda$ for
$i, j = 1, \dots, m$ and $i \ne j$. That is, the predictor variables are
equi-correlated with themselves and with the predictand. The two
distinct characteristic roots of $\Sigma_k^{-1}$ are $(1-\lambda)^{-1}$ and $(1+(k-1)\lambda)^{-1}$.
Let $\lambda_*$ and $\lambda^*$ denote, respectively, the minimum and maximum of the two
values. The quantity on the right hand side of (3.1) lies between
$\lambda_* \sum_{i=1}^{k} V_i$ and $\lambda^* \sum_{i=1}^{k} V_i$. Therefore, the asymptotic distribution of $(M-1)R_k^2$
is minorized (majorized) by the distribution of $\lambda_* S_k$ ($\lambda^* S_k$).

Let $\lambda = 0$. Then $\lambda_* = \lambda^* = 1$. Given $y$, $V_i^{1/2} = (y'x_i)/(y'y)^{1/2} \sim
N(\rho (y'y)^{1/2},\, 1-\rho^2)$ and Cov$(V_i^{1/2}, V_j^{1/2}) = -\rho^2$. Therefore

$$ (V_1^{1/2}, \dots, V_m^{1/2}) \sim (U_1 + \rho W^{1/2}, \dots, U_m + \rho W^{1/2}) \qquad (3.2) $$

where $W \sim \chi^2_{M-1}$ and $U_1, \dots, U_m$ are jointly normally distributed independent
of W, with mean zero and covariance, given by

$$ \mathrm{Var}(U_i) = 1-\rho^2, \qquad \mathrm{Cov}(U_i, U_j) = -\rho^2. $$

Thus for large M we have

$$ (M-1) R_k^2 \sim S_k $$

with the distribution of $V_i$ being given by (3.2). If moreover, $\rho = 0$
then $S_k$ is distributed as the sum of the k largest values in a sample of
m observations from a chi-square distribution with 1 degree of freedom.
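
The case $\lambda = \rho = 0$ can be illustrated by simulation. The sketch below draws mutually independent standard normal samples for Y and the m predictors, forms $V_i = (y'x_i)^2/(y'y)$ from deviations about the sample means, and returns the sum of the k largest values. The function name `simulate_sk`, the sample size M and the replication count are assumptions made for this example.

```python
import random


def simulate_sk(m, k, M, reps=4000, seed=0):
    """Simulate S_k, the sum of the k largest V_i = (y'x_i)^2 / (y'y),
    for mutually independent standard normal Y, X_1, ..., X_m."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(M)]
        ybar = sum(y) / M
        y = [v - ybar for v in y]                  # deviations from the sample mean
        yy = sum(v * v for v in y)
        vs = []
        for _ in range(m):
            x = [rng.gauss(0.0, 1.0) for _ in range(M)]
            xbar = sum(x) / M
            yx = sum(a * (b - xbar) for a, b in zip(y, x))
            vs.append(yx * yx / yy)
        vs.sort()
        out.append(sum(vs[m - k:]))                # the k largest V_i
    return out
```

With k = m the simulated values are sums of m chi-square variables with 1 degree of freedom each, so their average should be close to m.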

Diehr and Hoflin (1974) have given an empirical formula for the
percentage points of the distribution of $R_k^2$ for the case $\lambda = \rho = 0$.
From Table 1 of their paper we obtain the 90% and 95% points of $(M-1)R_k^2$
for m = 5, k = 1, 2, 3, M = 106, as shown below.

               k = 1    k = 2    k = 3

90% point       5.25     7.56     8.50

95% point       6.51     9.14    10.18

The above figures are slightly larger (as they should be) than
the corresponding percentage points of $2Y_k$ for m = 4, given in
Table 1 below.
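
For k = 1 the comparison can be reproduced without tables. With n = 1/2, $2Y_1$ is the largest of m chi-square variables with 1 degree of freedom, whose cdf is $\mathrm{erf}(\sqrt{x/2})^m$. The sketch below solves for a percentage point by bisection; the function names and the bracketing interval are assumptions of this example.

```python
import math


def cdf_two_y1(x, m):
    """cdf of 2*Y_1 for n = 1/2: the largest of m chi-square(1) variates."""
    return math.erf(math.sqrt(x / 2.0)) ** m


def percentile_two_y1(p, m, lo=0.0, hi=100.0):
    """Solve cdf_two_y1(x, m) = p for x by bisection on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf_two_y1(mid, m) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `percentile_two_y1(0.90, 4)` and `percentile_two_y1(0.95, 4)` gives the k = 1 points for m = 4, which can be set against the values 5.25 and 6.51 quoted above for m = 5.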